
feat: outlook ".msg" file converter #196

Open
wants to merge 4 commits into
base: main


makermotion

This pull request adds support for converting Outlook .msg files to Markdown by extracting email metadata and body content. The implementation adds a new class, OutlookMsgConverter, which extends the DocumentConverter base class.


Key Features
Email Metadata Extraction:
Extracts and includes key headers like From, To, Subject in the markdown output.

Email Body Conversion:
Reads the email body content and formats it into markdown.

Robust Encoding Support:
Attempts to decode content in UTF-16 first, falling back to UTF-8 and handling edge cases to ensure accurate conversion.
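
To illustrate the features above, here is a minimal sketch of the decoding fallback and the Markdown output. It is not taken from the PR diff: the helper names `decode_stream` and `render_markdown`, the heading text, and the bold-label layout are all assumptions made for illustration.

```python
from typing import Dict


def decode_stream(data: bytes) -> str:
    """Decode raw bytes: try UTF-16 first, then UTF-8, then a lossy UTF-8 pass."""
    try:
        return data.decode("utf-16-le").strip()
    except UnicodeDecodeError:
        pass
    try:
        return data.decode("utf-8").strip()
    except UnicodeDecodeError:
        # Last resort: drop bytes that cannot be decoded rather than fail.
        return data.decode("utf-8", errors="ignore").strip()


def render_markdown(headers: Dict[str, str], body: str) -> str:
    """Format headers and body as Markdown (layout is hypothetical)."""
    lines = ["# Email Message", ""]
    for key, value in headers.items():
        lines.append(f"**{key}:** {value}")
    lines += ["", "## Content", "", body]
    return "\n".join(lines).strip()


if __name__ == "__main__":
    subject = decode_stream("Quarterly report".encode("utf-16-le"))
    print(render_markdown({"From": "alice@example.com", "Subject": subject}, "See attached."))
```

One caveat with trying UTF-16 first: an even-length run of plain ASCII bytes also decodes as UTF-16 without raising, so the fallback only triggers when the UTF-16 decode actually fails.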


Implementation Details
File Validation:
Checks the file extension (.msg) before proceeding with the conversion.

Stream Parsing:
Uses olefile to parse the .msg file structure, extracting streams for headers and body content.

Error Handling:
Catches exceptions raised by invalid files or unexpected errors during conversion, so a bad input does not crash the converter.
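
The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names are hypothetical, and the stream names are the MAPI property streams commonly found in .msg files (sender address, display-to, subject, body), which may differ from what the converter actually reads.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumed stream names: MAPI property tags with type 001F (Unicode string).
HEADER_STREAMS = {
    "From": "__substg1.0_0C1F001F",
    "To": "__substg1.0_0E04001F",
    "Subject": "__substg1.0_0037001F",
}
BODY_STREAM = "__substg1.0_1000001F"


def is_msg_file(path: str) -> bool:
    """File validation: accept only the .msg extension (case-insensitive)."""
    return Path(path).suffix.lower() == ".msg"


def read_stream(ole, name: str) -> Optional[str]:
    """Return the decoded text of one stream, or None if missing or unreadable."""
    try:
        if not ole.exists(name):
            return None
        data = ole.openstream(name).read()
    except OSError:
        return None
    try:
        return data.decode("utf-16-le").strip()
    except UnicodeDecodeError:
        return data.decode("utf-8", errors="ignore").strip()


def extract_msg(path: str) -> Optional[Dict[str, str]]:
    """Parse a .msg file into a dict of fields; None means 'not convertible'."""
    if not is_msg_file(path):
        return None
    import olefile  # third-party dependency, imported lazily

    try:
        ole = olefile.OleFileIO(path)
    except OSError:
        return None  # missing file or not a valid OLE container
    try:
        fields = {}
        for label, stream in HEADER_STREAMS.items():
            value = read_stream(ole, stream)
            if value:
                fields[label] = value
        fields["Body"] = read_stream(ole, BODY_STREAM) or ""
        return fields
    finally:
        ole.close()
```

Returning None instead of raising lets a caller fall through to other converters when the input is not a usable .msg file.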

Please review the implementation and provide feedback or suggestions for improvement.

@makermotion
Author

@microsoft-github-policy-service agree

@l-lumin
Contributor

l-lumin commented Dec 22, 2024

Could you add tests?

@makermotion
Author

Added the tests.
